Abstract This paper describes an efficient EM algorithm for maximum likelihood estimation
of a system of nonlinear structural equations corresponding to a directed acyclic graph
model that can contain an arbitrary number of latent variables. The endogenous variables in
the model must be categorical, while the exogenous variables may be arbitrary. The models
discussed in this paper are an extended version of finite mixture models suitable for causal
inference. An application to the problem of education transmission is presented as an
illustration.

1 Introduction

Structural equation models (SEM) are defined by a system of nonlinear equations specifying
which variables have a direct causal effect on each endogenous variable in the system.
A recursive non parametric SEM is equivalent to a directed acyclic graph (DAG) and, also,
to a set of conditional independence statements. Pearl (1995) has shown that, under certain
conditions (the back-door and the front-door criteria), causal effects can be estimated from
the frequency distribution of the observed variables; these conditions are, however, rather
restrictive and are difficult to combine with statistical modeling assumptions. In this paper we
restrict attention to models where the full joint distribution of observed and latent variables
is identified and we describe an efficient algorithm for maximum likelihood estimation;
certain routines of this algorithm may also be used to compute natural direct causal effects
(Pearl, 2010).

The class of models considered in this paper may be seen as an extension of latent
class models in the sense that observable variables need not be independent conditionally
on the latent ones. In addition, an observable variable may have a direct effect on a latent
one and a latent variable may have a direct effect on another latent variable which is
conceptually distinct. These models are not entirely new; for example, Hagenaars (2002) has
considered an application to a social science context of a model which is a special case of
those considered here. The class of mixture models considered by Alfo and Trovato (2011) may
be seen as a special case of those studied here, relative to the dependence structure; a more
detailed discussion will be given in Section 2.2.

We present an application in the context of education transmission, a much debated issue 
in Econometrics and Labor Economics. In order to assess the causal effect of the education 
of the parents on that of their child, one needs to control for the latent endowments of the 
parents and that of the child, which are likely to be strongly associated. The approach we 
propose is based on estimating a recursive system of structural equations where the natural 
endowment of parents and child are treated as two latent endogenous variables; this, we 
believe, provides an innovative contribution to the existing literature on the subject which 
we review briefly in Section 5. 

The class of models studied in this paper is defined in Section 2, where we also examine
its relationship with related models. The computation of maximum likelihood estimates
and their implementation are discussed in Section 3, an approach to the evaluation of causal
effects is presented in Section 4 and the application to education transmission is presented
in Section 5.



2 A class of semi-parametric structural equation models 

We recall, following Pearl (2000), that a non parametric recursive structural equation model
is a system of equations in the variables $Z_1, \dots, Z_n$

$$Z_i = f_i(pa_i, E_i), \quad i = 1, \dots, n \qquad (1)$$

where $pa_i$ is the subset of variables which are assumed to be the direct causes of $Z_i$ (these
are usually called parents) and $E_1, \dots, E_n$ is a set of independent background exogenous
variables which account for all residual effects. The fact that the system is recursive implies
that, if $Z_h$ is a parent of $Z_i$, then $h < i$. The system is non-parametric in the sense that the
distribution of the $E_i$ and the form of the functions $f_i$ do not need to be specified. Such a
system is equivalent to a causal DAG where endogenous variables are represented by nodes and
there is an arrow from $Z_h$ to $Z_i$ if $Z_h$ is a direct cause of $Z_i$, that is if $Z_h \in pa_i$. A
convenient property of causal DAGs is that the joint distribution may be factorized into the
product of the conditional distributions of each node given its parents. A DAG can contain one
or more latent nodes; for example, in the case of education transmission discussed in Section 5,
the unobservable endowments of the parents and that of the child are supposed to affect the
educational achievements of the latter.

The methodology described in this paper is applicable when endogenous variables, observed
or latent, are categorical. Our models differ from non parametric SEM because, when
a variable is assumed to depend on two or more other variables, we allow some of these effects
to be additive on a logit scale appropriate to the nature of the response variable under
consideration. Essentially, logits of type reference category or adjacent are more appropriate
when response categories are not ordered, logits of type global are preferable when
response categories are ordered and logits of type continuation are more suitable when
response categories correspond to survival or achievements; see Colombi and Forcina (2001)
for a detailed discussion. If $Z_i$ has categories coded as $0, 1, \dots, c_i - 1$, the $i$th structural
equation has $c_i - 1$ components, one for each logit of $Z_i$ and, in the special case when the
effects of its parents are additive, the $h$th logit ($h = 1, \dots, c_i - 1$) may be written as

$$\lambda_{ih} = \sum_{l=1}^{h} \beta_{i0l} + \sum_{Z_j \in pa_i} \sum_{l=1}^{c_j - 1} \beta_{ijl}\, I(Z_j \ge l), \qquad (2)$$

where $I(Z_j \ge l)$ is the indicator function. Note that we have used the incremental coding for
the $\beta$s; this means that, for instance, $\beta_{i0h}$ is the difference in the intercepts of the
$h$th and $(h-1)$th logits for $Z_i$. The reconstruction formulas for the case of logits global and
adjacent, the only types used in this paper, are given below

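$$(g):\quad P(Z_i = h) = \frac{\exp(\lambda_{i,h})}{1 + \exp(\lambda_{i,h})} - \frac{\exp(\lambda_{i,h+1})}{1 + \exp(\lambda_{i,h+1})}$$

$$(a):\quad P(Z_i = h) = \frac{\exp\left(\sum_{l=1}^{h} \lambda_{i,l}\right)}{1 + \sum_{k=1}^{c_i - 1} \exp\left(\sum_{l=1}^{k} \lambda_{i,l}\right)}$$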
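The reconstruction of category probabilities from logits can be illustrated with a minimal Python sketch. It assumes that the $h$th global logit is the log-odds of $Z \ge h$ against $Z < h$ and that the $h$th adjacent logit is the log-odds of $Z = h$ against $Z = h - 1$; the function names are hypothetical and are not taken from the authors' software.

```python
import math

def expit(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def probs_from_global_logits(lam):
    """Category probabilities from global logits.

    Assumed convention: lam[h-1] = log P(Z >= h) / P(Z < h), h = 1..c-1.
    The logits must be non-increasing in h for the result to be valid.
    """
    # Survival probabilities P(Z >= h) for h = 0..c, with P(Z >= 0) = 1
    # and P(Z >= c) = 0 as boundary values.
    surv = [1.0] + [expit(v) for v in lam] + [0.0]
    return [surv[h] - surv[h + 1] for h in range(len(lam) + 1)]

def probs_from_adjacent_logits(lam):
    """Category probabilities from adjacent logits.

    Assumed convention: lam[h-1] = log P(Z = h) / P(Z = h-1), h = 1..c-1.
    """
    cum = [0.0]
    for v in lam:
        cum.append(cum[-1] + v)          # cumulative sums of the logits
    m = max(cum)                          # subtract the max for stability
    weights = [math.exp(s - m) for s in cum]
    total = sum(weights)
    return [w / total for w in weights]
```

Both functions return a list of $c$ probabilities summing to one; applying the corresponding logit transformation to the output recovers the input.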



From the software point of view, any model of our class is determined by the following 
specifications: 

- An ordered list of the endogenous variables such that, if there is an arrow from $Z_i$ to $Z_j$,
then $Z_i$ comes before $Z_j$;

- A binary indicator specifying which variables, among the endogenous ones, are latent; 

- For each endogenous variable, the list of its parents; 

- For each node, the corresponding link function; this is determined by the number of 
categories of the node variable and the type of logit (adjacent, global, continuation) 
which determines how its conditional distribution is parameterized; 

- For each endogenous variable, a regression model which specifies how its logits depend
on the parents and, possibly, on additional exogenous variables measured at the level of
statistical units; this is determined by a design matrix for each response variable.
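The specifications listed above could be collected in a small data structure. The following Python sketch is only an illustration: the `Node` class, its field names and the toy three-node model are hypothetical, and the design matrices are omitted for brevity.

```python
from dataclasses import dataclass, field

LOGIT_TYPES = ("adjacent", "global", "continuation")

@dataclass
class Node:
    """One endogenous variable of the DAG (hypothetical container)."""
    name: str
    n_cat: int                                   # number of categories
    logit: str                                   # one of LOGIT_TYPES
    parents: list = field(default_factory=list)  # names of the parents
    latent: bool = False                         # True for a latent node

def check_recursive(nodes):
    """Check that the list is ordered so that parents precede children."""
    seen = set()
    for node in nodes:
        if node.logit not in LOGIT_TYPES:
            raise ValueError(f"unknown logit type for {node.name}")
        for parent in node.parents:
            if parent not in seen:
                raise ValueError(f"{parent} must come before {node.name}")
        seen.add(node.name)
    return True

# Toy model: one latent variable U and two observed responses.
toy_model = [
    Node("U", 3, "adjacent", latent=True),
    Node("Y1", 3, "global", ["U"]),
    Node("Y2", 2, "global", ["U", "Y1"]),
]
```

Each node contributes `n_cat - 1` logits, so the toy model above has 2 + 2 + 1 = 5 structural components.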



2.1 Identifiability 

Identifiability results for latent class models under conditional independence are by now well
established. Recent results by Allman et al. (2009) can handle several extended latent class
models where certain subsets of the observable variables may be associated conditionally on
the latent. Though, to our knowledge, no results are available to determine whether a general
DAG with an arbitrary number of latent variables is identifiable, the numerical method
described by Forcina (2008) can be used to determine whether a given model is locally
identifiable with very high probability everywhere in the parameter space; this approach was
used in the application. Essentially, the method samples points from the parameter space and
checks whether the Jacobian matrix obtained by differentiating the log-linear parameters of
the saturated model for the joint distribution of the observable variables with respect to the
actual parameters of the model is well away from being singular.
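The idea behind this numerical check can be sketched in Python on a deliberately small example. The sketch below is not the procedure of Forcina (2008): it assumes a basic latent class model with two classes and three binary items, uses the contrasts $\log p_j - \log p_0$ as log-linear parameters of the saturated model, approximates the Jacobian by central differences and evaluates it at two fixed points rather than at randomly sampled ones. All function names are hypothetical.

```python
import math
from itertools import product

def cell_log_probs(theta):
    """Log-probabilities of the 8 observed cells.

    theta = [class logit, 3 item logits for class 0, 3 item logits for class 1].
    """
    w1 = 1.0 / (1.0 + math.exp(-theta[0]))
    weights = [1.0 - w1, w1]
    out = []
    for y in product(range(2), repeat=3):
        p = 0.0
        for c in range(2):
            term = weights[c]
            for k in range(3):
                q = 1.0 / (1.0 + math.exp(-theta[1 + 3 * c + k]))
                term *= q if y[k] == 1 else 1.0 - q
            p += term
        out.append(math.log(p))
    return out

def jacobian(theta, step=1e-5):
    """Central-difference Jacobian of the 7 contrasts log p_j - log p_0."""
    cols = []
    for a in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[a] += step
        dn[a] -= step
        lu, ld = cell_log_probs(up), cell_log_probs(dn)
        cols.append([((lu[j] - lu[0]) - (ld[j] - ld[0])) / (2 * step)
                     for j in range(1, 8)])
    return [[cols[a][r] for a in range(len(theta))] for r in range(7)]

def smallest_pivot(matrix):
    """Smallest absolute pivot of Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    smallest = float("inf")
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        smallest = min(smallest, abs(m[k][k]))
        if m[k][k] == 0.0:
            return 0.0
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n):
                m[r][c] -= f * m[k][c]
    return smallest

# Two well separated classes, and a point where the classes coincide.
separated = [0.3, 2.0, 1.5, 1.8, -1.6, -1.9, -1.4]
coincident = [0.3, 1.0, -0.5, 0.2, 1.0, -0.5, 0.2]
```

A pivot close to zero signals a (numerically) singular Jacobian: `smallest_pivot(jacobian(coincident))` is negligible because the class weight cannot be recovered when the two classes are identical, while the value at `separated` is clearly away from zero. A singular value decomposition would be a more robust choice than pivots in a real implementation.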

Typical modeling restrictions that might be used to achieve identifiability are assumptions
of additivity within a given link function, like, for example, a multivariate logistic
function. Continuous covariates may be included as exogenous variables; these are the
variables determined outside the system so that there is no equation that describes their
behavior. Clearly, when continuous covariates are available, a linear regression model within
the assumed link function must be used.



2.2 Discussion 

An interesting instance of the models described above was used informally by Hagenaars
(2002) as an extended latent class model. It may be interesting to note that, while in a basic
latent class model the parameters which determine the marginal distribution of the latent are
somehow separate from those which determine the conditional distribution of the responses,
in the general context described here, in principle, any node of the DAG may correspond to a
latent variable and, if there is a latent node $Z_i$ which has no parents, its marginal distribution
is determined by the $\beta_{i0h}$, the intercept parameters for the adjacent logits, whose number
equals the number of latent categories minus 1.

A different, but closely related, literature is that based on finite mixture models, like
those developed, for instance, in Alfo and Trovato (2011), where a selection variable and
two or more response variables are assumed to depend on a multivariate continuous latent
distribution. However, when the underlying distribution is approximated with a discrete
distribution with $K$ support points, the resulting model is equivalent to a DAG model with a
single discrete latent variable, say $U$; the special case where there are two responses $Y_1$, $Y_2$
and a selection variable $Y_0$ is displayed in the DAG below

[Figure: DAG with a single discrete latent variable $U$, a selection variable $Y_0$ and two responses $Y_1$, $Y_2$]

It is worth noting that the true multivariate nature of the underlying latent, once turned
into a discrete one, should show up in the values of the estimated intercept parameters
$\beta_{ijl}$, where $i$ indexes the response variable, $j$ the latent and $l$ the category of the latent;
the fact that $\beta_{ijl}$ is positive (or negative) for all $i$, $l$ indicates that the underlying latent
is essentially uni-dimensional.



3 Maximum likelihood estimation 

Under the assumption that, conditionally on exogenous covariates, the joint distribution of 
the variables (both the observable and the latent ones) in the DAG is multinomial, any iden- 
tifiable model may be fitted by an EM algorithm. In the E-step we update the hypothetical 
latent distribution on the basis of the posterior probabilities that the subjects with a given 
observed response profile belong to each possible latent configuration and in the M-step we 
maximize the multinomial likelihood of the latent distribution. 

In spite of the rather complex framework, the E-step has the familiar form of the product
of the observed frequencies times the estimated posterior probabilities. Let $\pi_{h|j}(i)$ denote
the probability of belonging to latent configuration $h$ conditionally on having observed
configuration $j$ for the $i$th unit, where $j$ and $h$ denote, respectively, a given cell of the
observed and latent frequency table; let also $N_j(i)$ denote the observed frequency in cell $j$
for the $i$th unit. The reconstructed frequency table is given by

$$M_{j,h}(i) = N_j(i)\, \pi_{h|j}(i).$$
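As an illustration of this E-step, the following Python sketch completes an observed frequency table for a single unit (or covariate configuration). It assumes that the current estimate of the joint distribution is stored as a list of rows, one per observed cell $j$, each holding the joint probabilities over the latent configurations $h$; the function name and this layout are hypothetical.

```python
def e_step(joint, counts):
    """Complete the observed table: M[j][h] = N[j] * P(h | j).

    joint[j][h] : current joint probability of observed cell j and latent cell h
    counts[j]   : observed frequency in cell j
    """
    completed = []
    for p_row, n in zip(joint, counts):
        marginal = sum(p_row)                  # P(observed cell j)
        completed.append([n * p / marginal for p in p_row])
    return completed

# Toy example with two observed cells and two latent configurations.
joint = [[0.1, 0.3], [0.4, 0.2]]
counts = [8, 12]
completed = e_step(joint, counts)
```

In this toy example the posterior probabilities are (0.25, 0.75) in the first cell and (2/3, 1/3) in the second, so each row of the completed table sums to the corresponding observed count.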

Due to the recursive nature of this class of models, the M-step may be performed by
maximizing the conditional likelihood of each endogenous variable conditionally on its parents
and on exogenous variables. An efficient algorithm for fitting these generalized logistic
models is described in Evans and Forcina (2012), Section 4.

Though the theory required to implement the EM algorithm for our models is straightforward,
the difficulty lies in setting up software that can perform these tasks efficiently
having as input a general DAG with an arbitrary number of latent variables. Essentially, in
the E-step we first need to compute the marginal probability distribution of the observed
variables and then expand this back into the joint distribution while, in the M-step, we first
need to compute, for each node, the conditional distribution of the response variable given
its parents and, at the end, reconstruct the joint distribution recursively. The basic idea is
to arrange probabilities and frequencies in lexicographic order so that the categories of $Z_j$
run faster than those of $Z_i$ if $j > i$. Marginal distributions are computed by first
rearranging entries into a two-way table where the variables to be retained are by column and
then summing across rows. Expansion of a smaller table into a larger one is performed first by
replicating each entry a number of times equal to the number of cells of the omitted variables
and then rearranging entries according to the original ordering of variables. Rearrangement
of cells is performed by suitable indices which are constructed before starting the
algorithm. The MATLAB functions that implement the EM algorithm on a general DAG will be
made available as supplementary material.
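The role of the precomputed indices can be illustrated by a short Python sketch. It is a simplified stand-in for the rearrangement described above, not a port of the authors' MATLAB functions: tables are stored as flat lists in lexicographic order (last variable running fastest), and a single index vector, built once, serves both marginalization and expansion. All names are hypothetical.

```python
from itertools import product

def cells(n_cat):
    """All cells in lexicographic order: the last variable runs fastest."""
    return list(product(*[range(c) for c in n_cat]))

def marginal_index(n_cat, keep):
    """For each cell of the full table, the position of its projection
    onto the variables in `keep` within the smaller table."""
    sub = [n_cat[k] for k in keep]
    position = {cell: r for r, cell in enumerate(cells(sub))}
    return [position[tuple(cell[k] for k in keep)] for cell in cells(n_cat)]

def marginalize(table, index, size):
    """Sum the entries of the full table that share the same projection."""
    out = [0.0] * size
    for value, r in zip(table, index):
        out[r] += value
    return out

def expand(small, index):
    """Replicate the entries of the smaller table over the full table."""
    return [small[r] for r in index]

# Two variables with 2 and 3 categories; the index is built only once.
n_cat = [2, 3]
table = [1, 2, 3, 4, 5, 6]
idx_first = marginal_index(n_cat, keep=[0])
idx_second = marginal_index(n_cat, keep=[1])
```

Here `idx_first` is `[0, 0, 0, 1, 1, 1]` and `idx_second` is `[0, 1, 2, 0, 1, 2]`, so marginalizing `table` gives `[6, 15]` and `[5, 7, 9]` respectively. Because the indices depend only on the structure of the table, they can be reused at every iteration of the EM algorithm.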

To start the algorithm, an initial E-step is performed by assuming that the posterior
probabilities $\pi_{h|j}(i)$ are uniform, except for a small random perturbation. In the initial
M-step a one-step ahead logistic model is fitted and estimates are adjusted to smooth possibly
large absolute values. In this way an initial estimate of the latent distribution is obtained.
With some expertise, the models described in this paper could also be fitted with the LG-Syntax
module described by Vermunt and Magidson (2008).



The methodology described by Bartolucci and Forcina (2006), Section 3.3, was used
to compute standard errors of the parameter estimates from the estimate of the expected
information matrix. The idea is to collect all parameters into the vector $\beta$, to compute the
score vector of the log-likelihood for the observed distribution by the chain rule and the
information matrix as follows

dy de'dp" ^/n), 

where $\gamma$ is the vector of log-linear parameters for the saturated log-linear model of the
observed distribution, $\theta$ is the vector of log-linear parameters for the latent distribution
and $F$ is the expected information matrix. The extension of this procedure to a general DAG
model is a rather complex task which is handled by specific routines which exploit the
rearrangement indices mentioned above.



4 Evaluation of causal effects 



In this paper we formulate the questions of interest and compute appropriate answers within
the formal language developed by J. Pearl (see for example Pearl, Chapter 3), which
we summarize briefly below. It may be useful to note that, in Pearl's framework, the joint
distribution of the observed variables in the DAG is assumed to be known, or estimated
from observed frequencies, and the formal language is required to evaluate causal effects by
taking into proper account the causal relations described by the DAG. The fact that certain
variables are or are not endogenous is irrelevant when we estimate the statistical model,
as long as the conditional independencies implied by the DAG are true. However, while in
a non-parametric context certain causal effects may not be estimable from the joint
distribution of the observable variables, in our semi-parametric framework, once the statistical
model is identifiable, any causal effect of interest may be easily computed from the
estimated latent distribution.

In a structural equation model, see equation (1), we may evaluate the causal effect of a
subset of variables $X = (Z_i)_{i \in I}$ on $Y = (Z_j)_{j \in J}$, with $J$ disjoint from $I$, by first
applying the "do operator"

$$P(y \mid do(x)) = \sum_{z_i:\, i \notin I \cup J} P(z_1, \dots, z_n \mid do(x)),$$

this is equivalent to determining the distribution that would arise if we could perform an
idealized experiment where the variables in $X$ were randomized. Once the intervention
distribution has been constructed, we need to choose how to compare distributions of $Y$ for
different values of $x$: the two most obvious alternatives are differences or ratios of the
relevant probabilities. Because in the application we deal with ordered categorical
distributions, we simply compute the ratio of the corresponding survival probabilities.
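A minimal Python sketch of these two steps is given below. It assumes that every node is described by a table of conditional probabilities given its parents, computes the intervention distribution by dropping the factors of the intervened nodes from the DAG factorization, and then forms the ratio of survival probabilities. The toy DAG ($U \to X$, $U \to Y$, $X \to Y$), its numbers and the function names are hypothetical.

```python
from itertools import product

def intervention_dist(nodes, do, target):
    """P(target | do(...)) for a DAG with categorical nodes.

    nodes : list of (name, n_cat, parents, cpt) in recursive order, where
            cpt maps a tuple of parent values to a list of probabilities
    do    : dict {name: value} of variables fixed by intervention
    """
    names = [name for name, _, _, _ in nodes]
    sizes = {name: n for name, n, _, _ in nodes}
    out = [0.0] * sizes[target]
    for cell in product(*[range(sizes[name]) for name in names]):
        z = dict(zip(names, cell))
        if any(z[k] != v for k, v in do.items()):
            continue                    # keep only cells consistent with do()
        weight = 1.0
        for name, _, parents, cpt in nodes:
            if name in do:
                continue                # the factor of an intervened node is dropped
            weight *= cpt[tuple(z[p] for p in parents)][z[name]]
        out[z[target]] += weight
    return out

def survival_ratio(p_new, p_ref, k):
    """Ratio of P(Y >= k) under two intervention distributions."""
    return sum(p_new[k:]) / sum(p_ref[k:])

# Toy DAG: latent U affects X and Y, and X affects Y.
nodes = [
    ("U", 2, [], {(): [0.5, 0.5]}),
    ("X", 2, ["U"], {(0,): [0.8, 0.2], (1,): [0.3, 0.7]}),
    ("Y", 3, ["U", "X"], {(0, 0): [0.6, 0.3, 0.1], (0, 1): [0.4, 0.4, 0.2],
                          (1, 0): [0.3, 0.4, 0.3], (1, 1): [0.1, 0.4, 0.5]}),
]
p0 = intervention_dist(nodes, {"X": 0}, "Y")
p1 = intervention_dist(nodes, {"X": 1}, "Y")
```

In this toy example `p0` is (0.45, 0.35, 0.20) and `p1` is (0.25, 0.40, 0.35), so the ratio of the probabilities of reaching at least level 1 is 0.75 / 0.55 and that of reaching level 2 is 0.35 / 0.20 = 1.75. Enumerating all cells is only practical for small tables; it is used here for clarity.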

4.1 Direct effects

In a complex DAG causal effects may act through several different pathways, and we may
be interested in assessing the effects that act along certain specific paths. Consider, for
instance, the model described in Table 1 presented in section 5. There, $S^p$ (parents' education)
affects $S^c$ (child education) directly, or by affecting $U^c$ (child latent endowment) or $Y$
(family income) which, in turn, affect $S^c$. The effect of $U^p$ (parents' latent endowment) travels
through many channels, but we would mainly be interested in its effect on $S^c$ while observed
family background is held fixed, to capture the effect of natural inheritance, that is the path
from $U^p$ to $S^c$ going through $U^c$.

Effects exerted through specific paths are called 'direct effects'. In the literature different
definitions of direct effects have been considered; the one used in our application is the
'natural direct effect', which is defined as follows (see for example Pearl (2000), Definition
4.5.1, or Pearl (2010), Section 6.1.3). Suppose we are interested in the causal effect of a set
of variables $X$ on $Y$ exerted through all paths except those going through a set of mediating
variables $M = (Z_k)_{k \in K}$, with $K$ disjoint from $I$, $J$. Then, first we compute the intervention
distribution obtained by setting $X = x$ and $M = m$

$$P(y \mid do(x), do(m)) = \sum_{z_i:\, i \notin I \cup J \cup K} P(z_1, \dots, z_n \mid do(x), do(m));$$

the effect of $M$ is then averaged out, with weights provided by the distribution of $M$ when $X$
is set to its reference category by intervention.
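The averaging step described above may be written out explicitly. In the sketch below, $x_0$ denotes the reference category of $X$ and $P_{NDE}$ is a label introduced here only for illustration; neither symbol appears in the original text.

```latex
P_{NDE}(y; x) = \sum_{m} P(y \mid do(x), do(m)) \, P(m \mid do(x_0)),
\qquad
\text{ratio for level } k:\quad
\frac{\sum_{y \ge k} P_{NDE}(y; x)}{\sum_{y \ge k} P_{NDE}(y; x_0)} .
```

The second expression is the comparison by ratios of survival probabilities, under the assumption that the effect of moving $X$ from $x_0$ to $x$ is the quantity of interest.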

Computation of direct effects requires the computation of several intervention
distributions, a task that is similar to the one implemented within the EM algorithm described
above to reconstruct the joint distribution. In practice, the basic ingredients are the DAG
structure and, for each node, the estimated conditional distribution given its parents. Then,
nodes are processed one at a time to reconstruct the required intervention distribution.

5 Application to Education transmission

The question of assessing the effect of raising the education of the parents by policy
intervention on the education of their children is difficult because the answer depends on the
extent to which the association between parents' and children's education is due to the
transmission of unobservable endowments across generations.



5.1 Background and Literature 

For simplicity, consider the very simple model in the four variables $S^p, U^p$ and $S^c, U^c$,
which denote schooling and unobservable endowments respectively for parents and child, while
$e^p, e^c$ are exogenous errors, and assume that

$$S^c = f(S^p, U^c, e^c) \qquad (3)$$
$$U^c = g(S^p, U^p, e^p). \qquad (4)$$

This model says that a child's education depends on her own endowment and her parents'
education, and in turn the child's endowment depends on her parents' schooling and
endowment. Under this model the observed association between $S^p$ and $S^c$ is partly due to the
effect of endowment on schooling within each generation combined with the transmission
effect from $U^p$ to $U^c$. Thus the stronger the endowment transmission effect, the weaker the
scope of education policy. One could substitute from equation (4) into (3) to get the reduced
form equation

$$S^c = f(S^p, U^p, e) \qquad (5)$$

which requires controlling only for parents' endowment. Three main approaches in this
direction have been pursued. Behrman and Rosenzweig (2002) take differences between
subjects with twin mothers, having adjusted for assortative mating in order to control for
differences between education of fathers. A second approach uses data on adoptees under the
assumption that there should be no endowment transmission although, as noted by Holmlund et al.
(2011), association may be induced by selective placement of adoptees. Finally, Black et al.
(2005) analyze a dataset where differences in parents' education were exogenously induced
by reforms in municipal schooling laws, which they used as an instrument. For a critical
assessment see Holmlund et al. (2011), who apply the three methods to a single data set and
show that they produce conflicting results.

Alternatively one could estimate equation (3) in isolation, which requires controlling
only for the child's endowment. By fitting a much more complex version of (3), Cameron
and Heckman (1998) address the issue of how the family background affects the probability
of transition from one grade of education to the next. Though their model resembles
(3), the heterogeneity is assumed independent of the observed covariates, so it could be
interpreted as the component of $U^c$ which is not determined by family background.

The variable $U^p$, named family endowment, is essentially identified by the variables it
affects, so it is meant to capture the family environment in which children grow up. It is in
principle a cross classification of various characteristics of the family, but in practice it turns
out to be naturally ordered in a scale of 'quality'. The child's unobservable $U^c$ is identified
mainly through cognitive and non-cognitive test scores, so it is not to be interpreted as
strictly reflecting an individual intrinsic endowment; it is rather a mixture of this and other
unobservables like motivation and acquired knowledge useful for schooling advancement.

5.2 Data

We use data from the National Child Development Survey (NCDS), produced by a UK
cohort study targeting the population born in the UK between the 3rd and the 9th of March
1958. Individuals were surveyed at different stages of their life and information on their
schooling achievement, various test results and family background was collected. A
complete description of the data is available at
http://www.esds.ac.uk/longitudinal/access/ncds.

Some variables are inherently discrete (notably schooling level) while others would be
more naturally described as continuous, like income and test scores. Because the finite
mixture model approach used in this paper can be applied only when all endogenous variables
are categorical, continuous variables were turned into discrete ones. Though clearly a
continuous variable contains more information relative to a discrete approximation, there are
two reasons why a model based on categorical variables may involve fewer parametric
restrictions than one based on the original continuous measurements. First, a continuous
variable used as explanatory in a regression model implies linearity unless additional
polynomial terms are introduced; instead, once it has been transformed into a set of dummy
variables corresponding to discrete categories, it can capture patterns of non linearity in a
non parametric way. Second, models involving a continuous variable as response are usually
based on the rather restrictive assumption of normality while, when used as categorical, the
discrete distribution is assumed to be multinomial, that is completely unrestricted, at least
in the first stage.

The original sample contains 18560 observations, but more than 80% have at least one
missing entry. Incompleteness is scattered across many variables included in the survey. The
subsample of complete data which we analyze amounts to 2801 subjects, 1471 males
(sons) and 1330 females (daughters). The marginal distributions of the summary statistics
for the most relevant variables in the complete-case sub-sample do not differ significantly
from the same distributions in the whole sample, but we cannot really exclude selection bias.
Our main dependent variable $S^c$ is the amount of education achieved by each individual,
which takes four levels: no qualification, O-level, A-level and higher education.

Children are tested at the ages of 7 and 11 for mathematics, reading and non-cognitive
skills, and again at 16 for math and reading, and we use the test scores for identification of
the unobservable endowment. More specifically, after taking principal components (which
in all cases explain no less than 90% of the total variance) for math and reading we combine
scores at 7 and 11 into two ordered variables: EM and ER. Math and reading scores at
16 are coded in two additional variables LM and LR. For non-cognitive skills (available
at ages 7 and 11), principal components analysis yields two factors; these were averaged and
then dichotomized at the median into the binary variable NC.

Parents' schooling is defined as the age at which they left school (12 to 21 years); for
each parent we extract a three-level variable corresponding to significant educational steps:
leaving at up to 14 years of age; after 14 but not later than 16; after 16; these are called
$S^m$, $S^f$ for mother and father respectively. As usual there are many missing data on family
income; to alleviate the problem, since few mothers in the dataset have an income, we
neglect mother's income (thus avoiding dropping records with missing mother's income) and
concentrate on fathers'. We group their income in three categories into the ordered variable
$Y$.

The NCDS also contains information on parents' interest in their children's education,
as reported by teachers; this turns out to be an important variable; it should measure the
amount of effort or concern and, perhaps, is related to the value that the family gives to the
child's education. Parents' interest is originally classified into as many as 5 categories; we
extract three binary parents' interest variables, $I^7$, $I^{11}$, $I^{16}$.

6 The Model 

We estimate a system of equations which is an extended and highly complex version of (3)
and (4); because of its complexity, it is convenient to summarize the basic features of the
model in Table 1 below where, for each variable in the DAG, we give the number of
categories, the type of logit (g for global and a for adjacent) and the list of parents.

Table 1 Description of the model

 i    Z_i       n.cat.   logit   pa_i
 1    $U^p$     3        a       -
 2    $I^7$     2        g       $U^p$
 3    $I^{11}$  2        g       $U^p$
 4    $I^{16}$  2        g       $U^p$
 5    $S^m$     3        g       $U^p$
 6    $S^f$     3        g       $U^p$, $S^m$
 7    $Y$       3        g       $U^p$, $S^m$, $S^f$
 8    $U^c$     3        a       $U^p$, $I^7$, $I^{11}$, $I^{16}$, $S^m$, $S^f$
 9    EM        3        g       $U^c$
 10   LM        3        g       $U^c$, EM
 11   ER        3        g       $U^c$
 12   LR        3        g       $U^c$, ER
 13   NC        2        a       $U^c$
 14   $S^c$     4        g       $S^m$, $S^f$, $Y$, $U^c$

Note that there is an arrow from $S^m$ to $S^f$ to account for assortative mating. In the fitted
model the dependence of each node on its parents is assumed linear on the appropriate logit
link transformation. In particular, because all observable variables in the system are naturally
ordered, we use cumulative (or 'global') logits. The levels of unobservable variables, instead,
are assumed to correspond to unordered qualitative types, so we use adjacent logits. Separate
models were fitted for daughters and sons to account for gender effects.

6.1 Main Estimation Results

In Table 2 we display some of the most relevant parameter estimates from different structural
equations included in the model, which we fitted to data on sons and daughters separately.
First of all note that all the $\beta_{i1l}$ parameters are negative and usually significant; this
indicates that the three parents' latent classes may be ordered from best to worst. The only
exception are the $\beta_{81l}$s which, being positive, indicate that the child's latent classes may
also be ordered from best to worst. This is in agreement with the fact that all the other
displayed $\beta$s in the equation for $U^c$ are negative, indicating that increasing rearing efforts
and higher education on the parents' side are positively associated with an improvement of the
latent endowment of the child. The few displayed parameters from the equations for early and
late score in math indicate that better endowed children get better scores and that
performances are correlated in time. Finally, for the educational achievements, the displayed
estimates confirm that more endowed children get higher achievements. The parameter estimates
for the association with the education of the parents are less obvious to interpret:
essentially we see that while the association with the education of the father is positive and
significant for the son, the association for the daughter is close to 0 and smaller than the
association with the education of the mother. These results may be interpreted as indicating a
possible gender (or role) effect which may act either as pressure from the related parent or
as an effort of emulation. A more specific interpretation of these results within the context
of causal inference is described in the next section.

Table 2 Parameter estimates and standard errors (se) for sons and daughters

                      Daughters             Sons
 Parameter            coeff      se         coeff      se
 $\beta_{4,1,1}$     -1.4133    0.3007     -1.3860    0.2721
 $\beta_{4,1,2}$     -2.7674    0.3951     -2.2284    0.2546
 $\beta_{5,1,1}$     -2.5027    0.2776     -2.4058    0.2285
 $\beta_{5,1,2}$     -0.7036    0.1750     -0.6301    0.1437
 $\beta_{6,1,1}$     -2.9051    0.3717     -5.7254    0.5329
 $\beta_{6,1,2}$     -0.2087    0.2082     -0.4647    0.1749
 $\beta_{8,1,1}$      1.2421    0.6426      0.1621    0.5733
 $\beta_{8,1,2}$      2.8051    0.5843      1.4549    0.4474
 $\beta_{8,4,1}$     -0.5668    0.2332     -0.8648    0.1737
 $\beta_{8,5,2}$     -0.6575    0.2957     -0.8067    0.2468
 $\beta_{8,6,2}$     -0.0092    0.3903     -0.6447    0.4554
 $\beta_{9,8,1}$     -2.8296    0.1876     -2.8735    0.2047
 $\beta_{9,8,2}$     -2.6018    0.1278     -2.5633    0.1204
 $\beta_{10,8,1}$    -3.1387    0.2905     -3.0296    0.2333
 $\beta_{10,8,2}$    -2.0644    0.2211     -1.5633    0.2147
 $\beta_{10,9,1}$     0.3476    0.1984      0.5046    0.1842
 $\beta_{14,5,1}$     0.1585    0.1491     -0.0599    0.1422
 $\beta_{14,5,2}$     0.3622    0.2009     -0.1078    0.2000
 $\beta_{14,6,1}$     0.0818    0.1539      0.3573    0.1464
 $\beta_{14,6,2}$     0.1118    0.2050      0.6453    0.2129
 $\beta_{14,8,1}$    -1.8843    0.1831     -3.0141    0.2135
 $\beta_{14,8,2}$    -2.8394    0.2107     -1.7979    0.2177



6.2 Estimated direct effects 

The results are presented in Table 3, where estimates for sons and daughters are considered
separately. The comparisons are expressed as ratios of survival probabilities so that, for
example, the upper-left value of 1.3640 says that the probability that a girl reaches education
level at least 1 when $R = (1, 1, 1)$ is 1.3640 times the corresponding probability when
$R = (0, 0, 0)$. The effect of parents' education is calculated excluding the income path, so it
includes the indirect effect exerted via the child's endowment.



Table 3 Causal effects on $S^c$

                                   Daughters                     Sons
                                   $S^c \ge 1$   $S^c \ge 2$     $S^c \ge 1$   $S^c \ge 2$
 $R$ from min to max               1.3640        1.8120          1.7379        2.7337
 Separately for the three components
   $R^7$ from 0 to 1               1.0398        1.0733          1.0780        1.1263
   $R^{11}$ from 0 to 1            1.1761        1.3554          1.2824        1.5182
   $R^{16}$ from 0 to 1            1.1023        1.1943          1.2373        1.4215
 Mother's schooling $S^m$
   from 0 to 1                     0.9383        0.9144          0.9893        0.9699
   from 1 to 2                     1.2005        1.5480          1.1840        1.2741
   from min to max                 1.1264        1.4155          1.1713        1.2357
 Father's schooling $S^f$
   from 0 to 1                     1.0593        1.1330          1.1746        1.4384
   from 1 to 2                     1.0259        1.0748          1.3195        1.9446
   from min to max                 1.0867        1.2177          1.5499        2.7971
 Income $Y$
   from 0 to 1                     1.0275        1.0773          1.0071        1.0178
   from 1 to 2                     1.0546        1.1596          1.0720        1.1844
   from min to max                 1.0836        1.2493          1.0796        1.2055